feat: MinorTraceChemistry backfill job (#600)#640
feat: MinorTraceChemistry backfill job (#600)#640
Conversation
Add chemistry backfill that maps legacy NMA_Radionuclides rows into Observation records keyed on nma_pk_chemistryresults (idempotent upsert). - Schema: add chemistry columns to Observation and Sample models/schemas - Migration: add columns + unique constraints, drop stale NMA_Radionuclides.thing_id - Lexicon: add pCi/L unit and Chemistry Observation note type - Backfill: resolve Samples via nma_pk_chemistrysample, create Parameters, AnalysisMethods, Notes; detect_flag from legacy Symbol qualifier - Tests: BDD step definitions for all 7 scenarios, feature file fixes - Orchestrator: wire radionuclides step into backfill pipeline Refs #558 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Wrap cleanup chain in try/except to prevent orphan data on failure (#566) - Scope AnalysisMethod cleanup to test-created IDs only (#567) Also addresses #565 (scalar_one hardening) and #568 (remove unused parameter_ids tracker) which were applied to the new files in the preceding feature commit. Refs #558 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add per-row savepoint isolation so one bad row doesn't abort the run - Skip rows with missing AnalysisDate instead of inserting epoch sentinel - Remove unused batch_size parameter from backfill signatures and CLI - Fix migration downgrade to backfill thing_id before setting NOT NULL - Track analysis_method_ids in test step defs for scoped cleanup Refs #558 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…leanup - Migration upgrade now checks information_schema before dropping thing_id column, guarding against environments where it was already removed - run_backfill.sh no longer references removed --batch-size parameter - Test observation cleanup scoped to scenario sample_ids instead of blanket nma_pk_chemistryresults IS NOT NULL query Refs #558, #570 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… tracking - Migration upgrade dynamically discovers FK names via sa.inspect() with guard for missing name entries - Per-row exception handler now logs full traceback via logger.exception() before appending summary to result.errors - Analysis method tracking scoped to scenario sample_ids - Removed incorrect issue reference from test comment Refs #558 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ow catch - Volume/volume_unit test assertions query by scenario sample_ids - existing_keys set lowercases DB values to match global_id_str normalization - Per-row catch broadened to Exception with logger.exception() for tracebacks - Analysis method tracking scoped to scenario samples - Migration FK drop guarded against missing name entries - Resolve stash merge conflicts in sample volume steps Refs #558 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Snapshot AnalysisMethod IDs before/after backfill to track only actually-created methods, preventing deletion of pre-existing rows. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Delete context._backfill_created in finally block to prevent ID accumulation across scenarios causing redundant deletes. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add ORDER BY id to legacy query so lowest-PK row wins when multiple radionuclide rows map to the same Sample (volume/volume_unit) - Skip overwrites with warning when a conflict is detected - Reject unknown CLI args in backfill entrypoint instead of silently ignoring them Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ntegrationGroup/OcotilloAPI into 558-radionuclides-backfill
Skip pg_cron extension creation gracefully when unavailable instead of hard-failing, unblocking local dev and test environments. Refs #576 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Skip pg_cron extension creation gracefully when unavailable instead of hard-failing, unblocking local dev and test environments. Also skip search vector trigger sync for tables that don't exist yet (e.g. asset) to avoid NoSuchTableError during test setup. Refs #576 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ntegrationGroup/OcotilloAPI into 558-radionuclides-backfill
Resolves test database connection issue (conftest load_dotenv override) and merges alembic migration heads. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add backfill logic and test scenarios for migrating legacy NMA_MinorTraceChemistry records into the Ocotillo schema. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Adds an idempotent backfill path for migrating legacy NMA chemistry results (Radionuclides + MinorTraceChemistry) into the Ocotillo Observation/Sample schema, along with BDD coverage and the supporting schema/lexicon/migrations.
Changes:
- Implement chemistry backfill jobs that upsert
Observationrecords keyed bynma_pk_chemistryresultsand populateSamplechemistry audit/volume fields. - Add Behave step definitions + feature updates to validate idempotency, field mapping, orphan skipping, notes creation, and linkage integrity.
- Extend DB models/schemas/lexicon and add Alembic migrations (including a migration-head merge).
Reviewed changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| transfers/backfill/chemistry_backfill.py | New backfill implementation for Radionuclides + MinorTraceChemistry into Observation/Sample, including upsert + notes creation. |
| tests/features/steps/chemistry-backfill.py | New reusable Behave steps to create legacy fixtures, run backfills, and assert mappings/linkage. |
| tests/features/nma-chemistry-radionuclides-refactor.feature | Scenario tweaks to ensure required fixtures/fields exist for radionuclides scenarios. |
| tests/features/nma-chemistry-minortracechemistry-refactor.feature | Scenario tweaks to ensure required fixtures/fields exist for minor/trace chemistry scenarios. |
| tests/features/environment.py | Adds per-scenario cleanup for chemistry-backfill-created data to support running without DB rebuild. |
| db/sample.py | Adds nma_pk_chemistrysample, volume, volume_unit to Sample. |
| db/observation.py | Adds nma_pk_chemistryresults, detect_flag, uncertainty, analysis_agency to Observation. |
| schemas/sample.py | Exposes new sample chemistry/audit fields on the API response schema. |
| schemas/observation.py | Exposes new observation chemistry/audit fields on the API response schema. |
| core/lexicon.json | Adds lexicon terms for pCi/L and note type Chemistry Observation. |
| alembic/versions/545a5b77e5e8_add_chemistry_backfill_columns_to_.py | Adds new columns/constraints + updates observation_version; drops stale legacy column. |
| alembic/versions/1f1bdf75e0c0_merge_migration_heads_after_staging_.py | Merge migration heads. |
| run_backfill.sh | Adjusts invocation of the backfill runner script. |
…nalysisDate - Add default_unit param to _get_or_create_parameter so minor trace uses ug/L instead of inheriting pCi/L from radionuclides - Log warning when legacy row has NULL AnalysisDate Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Fix Mapped types to use `| None` for nullable columns (observation, sample) - Skip rows with NULL AnalysisDate (observation_datetime is NOT NULL) - Default unit to ug/L when legacy row has no Units (unit is NOT NULL) - Restore $@ forwarding in run_backfill.sh - Verify well still exists in DB before reusing in BDD _ensure_well() Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
All 8 radionuclides scenarios were missing AnalysisDate, which the backfill requires. Now 16/16 scenarios pass across both chemistry feature files. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Fix date→datetime promotion: use isinstance checks instead of hasattr(tzinfo) which doesn't work for plain date objects - Use param.default_unit for unit fallback instead of hardcoded ug/L - Parse BDD AnalysisDate as date (not datetime) to match DB column type - Guard after_scenario cleanup to skip DB session when no fixtures created Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…NULL AnalysisDate Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
| with session_ctx() as session: | ||
| post_ids = {row[0] for row in session.execute(select(AnalysisMethod.id)).all()} | ||
| new_ids = post_ids - pre_ids | ||
| context._backfill_created["analysis_method_ids"] = list(new_ids) |
There was a problem hiding this comment.
_track_created_analysis_methods overwrites context._backfill_created["analysis_method_ids"] each time a backfill is run. In scenarios that run the job twice (idempotency), the second run will typically create zero new AnalysisMethods and this assignment will clear the IDs from the first run, so after_scenario won’t delete the AnalysisMethod rows created earlier in the scenario. Consider appending/unioning into the existing list instead of replacing it (e.g., extend with new_ids while keeping prior values).
| context._backfill_created["analysis_method_ids"] = list(new_ids) | |
| existing_ids = set(context._backfill_created["analysis_method_ids"]) | |
| context._backfill_created["analysis_method_ids"] = list(existing_ids | new_ids) |
| { | ||
| "categories": [ | ||
| "unit" | ||
| ], |
There was a problem hiding this comment.
ug/L is used as the default unit for MinorTraceChemistry Parameters/Observations, but it is not present in core/lexicon.json (and Parameter.default_unit / Observation.unit are lexicon-backed). This will cause FK/validation failures when running the backfill against a freshly initialized database (outside the BDD test seeding, which inserts ug/L manually). Add a ug/L unit term to the lexicon (or switch to an existing unit term already seeded).
| ], | |
| ], | |
| "term": "ug/L", | |
| "definition": "Micrograms per Liter" | |
| }, | |
| { | |
| "categories": [ | |
| "unit" | |
| ], |
Summary
NMA_MinorTraceChemistryrecords into the Ocotillo schema (Observation,Sample,Parameter,Notestables)Closes #600
Test plan
🤖 Generated with Claude Code